Influence of Study Type on Twitter Activity for Medical Research Papers

Authors

  • Jens Peter Andersen
  • Stefanie Haustein
Abstract

Twitter has been identified as one of the most popular and promising altmetrics data sources, as it possibly reflects a broader use of research articles by the general public. Several factors, such as document age, scientific discipline, number of authors and document type, have been shown to affect the number of tweets received by scientific documents. The particular meaning of tweets mentioning scholarly papers is, however, not entirely understood, and their validity as impact indicators is debatable. This study contributes to the understanding of factors influencing the Twitter popularity of medical papers by investigating differences between medical study types. A total of 162,830 documents indexed in Embase with a medical study type were analysed for study-type-specific tweet frequency. Meta-analyses, systematic reviews and clinical trials were found to be tweeted substantially more frequently than other study types, while all basic research received less attention than the average. The findings correspond well with clinical evidence hierarchies. It is suggested that interest from laymen and patients may be a factor in the observed effects.

Conference Topic: Altmetrics

Introduction

In the context of altmetrics, defined as “the study and use of scholarly impact measures based on activity in online tools and environments” (Priem, 2014, p. 266), Twitter has been identified as one of the most interesting and widely used data sources (Costas, Zahedi, & Wouters, 2014; Thelwall, Haustein, Larivière, & Sugimoto, 2013). Although restricted by brevity (a tweet is limited to 140 characters), Twitter is at the heart of the altmetrics idea to enable a broader scope for impact assessment beyond citation impact. As Twitter is used widely and particularly outside of academia by currently 284 million monthly active users, tweets mentioning scientific papers are hoped to capture use by the general public and thus societal impact.
Initially suggested as predictors of future citations and thus early indicators of scientific impact (Eysenbach, 2011), tweets are, according to more recent large-scale empirical studies, more likely to reflect online visibility, including some social and scientific impact but also self-promotion and buzz (Costas et al., 2014; Haustein, Larivière, Thelwall, Amyot, & Peters, 2014; Haustein, Peters, Sugimoto, Thelwall, & Larivière, 2014). The most tweeted documents seem to attract a lot of online attention because of humorous or curious topics rather than their scientific contributions, often fitting “the usual trilogy of sex, drugs, and rock and roll” (Neylon, 2014, para. 6). Various, mostly quantitative, studies have shown that, after the reference manager Mendeley, Twitter is the altmetrics data source with the second-largest prevalence for scientific papers, and that this prevalence is constantly increasing, with currently more than one fifth of 2012 papers being tweeted (Haustein, Costas, & Larivière, 2015). Correlation studies provide evidence that tweets and citations measure different things (for example, Costas et al., 2014; Haustein, Larivière, et al., 2014; Haustein, Peters, et al., 2014; Priem, Piwowar, & Hemminger, 2012; Thelwall et al., 2013; Zahedi, Costas, & Wouters, 2014). The latest research shows that Spearman correlations with citations for 2012 papers in Web of Science are low, at ρ=0.194 for all 1.3 million papers and ρ=0.148 excluding untweeted papers. Beyond the particular differences of Twitter coverage and density between scientific disciplines, research fields and journals reported by various studies (Costas et al., 2014; Haustein, Larivière, et al., 2014; Haustein, Peters, et al., 2014; Zahedi et al., 2014), Haustein et al. (2015) also identified large variations between document types deviating from patterns known for citations.

1 https://about.twitter.com/company
For example, news items and editorial material, which are usually considered non-citable items (Martyn & Gilchrist, 1968), are the most popular types of journal publications on Twitter, showing a tendency of increasing Twitter impact for brief and condensed document types. A study based on a random sample of 270 tweets to scientific papers found that the majority of tweets contained either the paper title or a summary, did not attribute authorship and had a neutral sentiment, while 7% were self-citations (Thelwall, Tsou, Weingart, Holmberg, & Haustein, 2013). Other findings suggest that automated diffusion of article links on Twitter plays a role as well (Haustein, Bowman, et al., 2015). Although these findings provide more evidence that the mechanisms behind tweeting a paper differ from those behind citing it, the meaning of tweets to scientific papers as well as the role of Twitter in scholarly communication are still unclear, not least because of the difficulty of identifying ‘tweeter motivations’ based on 140 characters. This study aims to contribute to a better understanding of tweets as impact metrics by analysing the type of content that is distributed on Twitter. We propose that certain types of articles appeal more to the public than others, for example because of their potential impact on health issues and everyday life or because of the way they are written. Previous research has suggested that certain medical study types have a larger citation potential than others (Andersen & Schneider, 2011; Kjaergard & Gluud, 2002; Patsopoulos, Analatos, & Ioannidis, 2005), likely because they are more useful to the research community. In the context of Twitter, medical papers are of particular interest because they are particularly relevant to general Twitter users (as opposed to, for example, physics research) and because practicing physicians are among the early adopters of social media in their work practice (Berger, 2009).
In a survey asking researchers about social media use in research, the uptake by health scientists was, however, slightly below average (Rowlands, Nicholas, Russell, Canty, & Watkinson, 2011). The aim of this paper is thus to investigate whether there is a connection between different medical study types and the frequency of tweets per article. We hypothesize that some study types are more popular on Twitter due to their attractiveness for a broader audience, such as applied medical research relevant to patients as well as meta-analyses summarizing research and condensing results. We will approach this hypothesis by first investigating the potential differences in tweet frequency for a range of medical study types. We argue that logically there should be a connection between the clinical evidence hierarchy (further explained below) and the types of studies patients might consider interesting to discuss or spread on social media, as the highest evidence levels are those which are most likely to affect clinical practice. We therefore expect differences in tweet frequency to be related to evidence levels.

Materials and Methods

Comparing the impact of medical research study types on Twitter requires two pieces of information per research article: a classification of the study type and the number of tweets received by each particular paper. Currently no database contains both pieces of information, so it was necessary to combine data from different sources. For this purpose, the medical study type classifications from the Embase bibliographical database were used, enriched with metadata from PubMed and Web of Science and then matched to Twitter data from Altmetric.com. The datasets and the matching approach are described in further detail below, followed by an account of the specific measurements and statistical tools employed as well as the limitations of this study.
Data collection and matching

Due to Twitter’s 140-character limitation, mentions of a scientific paper in tweets are restricted to links to the publisher’s homepage or unique document identifiers such as the Digital Object Identifier (DOI) or PubMed ID (PMID). As Twitter only provides access to the most recent tweets, it is necessary to constantly query various article identifiers to obtain a database of tweets to scientific papers. Altmetric LLP has been collecting tweets based on multiple document identifiers, including the DOI, PMID and the publisher’s URL, since July 2011 and thus provides a valuable data source for the purposes of our study. To ensure reliable and complete Twitter data, we focus our study on papers published in 2012. In order to link all tweets to the bibliographic data and study type classification from Embase, the DOI and the PMID are needed. The study type classifications (see below) for the analysis were retrieved from the Embase bibliographical database. Embase is a major database containing more bibliographical records than PubMed Medline; for example, 24% more for documents published in 2012. It is unclear whether the study type classifications of either database outperform those of the other; however, as the indexing of Embase is more exhaustive, we have chosen to use this database for our study. In order to identify relevant papers from Embase (and to be able to perform a citation analysis in the future), Clinical Medicine journals were selected from the Web of Science (WoS) based on the National Science Foundation (NSF) journal classification system. The Web of Science also provides bibliographic data and DOIs for the relevant papers, which were used to match Embase study types and tweets from Altmetric. Embase was queried for the relevant journals using the journal name and various abbreviations as well as the ISSN. Limiting the results to papers published in 2012, the metadata of 593,974 records was retrieved from Embase.
In order to obtain the PMID needed to match tweets, PubMed was queried in the same way, resulting in 497,619 records. Embase, PubMed and Web of Science were matched using the DOI, PubMed, and string matches of bibliographic information, resulting in 238,560 documents in the final dataset, of which 94.9% have a PMID and 91.1% a DOI. The bibliographic metadata was matched to the Altmetric database using the DOI and PMID, resulting in 80,116 records with at least one social media event as captured by Altmetric and 74,060 with at least one tweet at the time of data collection in August 2014. This amounts to 31% of the 238,560 documents being mentioned on Twitter at least once, which corresponds almost exactly to the Twitter coverage of biomedical & health sciences papers found by Haustein, Costas and Larivière (2015). To ensure comparability between tweets published in January and December 2012, we fixed the tweeting window to 18 months (546 days) for each of the tweeted documents, including tweets until 30 June 2013 for papers published on 1 January 2012 and until 30 June 2014 for papers published on 31 December 2012. The day of publication is based on the publication date provided by Altmetric. As this date is not available for all records and is sometimes incorrect, the set of tweeted documents was further reduced to 52,911 documents which had an Altmetric publication date in 2012 and had not received a tweet before the publication date. Although these steps lead to an underestimate of the percentage of tweeted papers, they help to reduce biases induced by publication age when comparing the visibility of different medical study types on Twitter.

2 Twitter’s REST API is limited to tweets from the previous week, while the Streaming API provides real-time data only.
3 For the publication year 2012, Embase contains 1,334,356 records (search: “2012”.yr) and PubMed Medline contains 1,072,384 (search: 2012[pdat]).
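The fixed tweeting window and the two exclusion rules described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `tweets_in_window`, its arguments, and the representation of tweets as plain dates are hypothetical and not taken from the study's actual processing pipeline.

```python
from datetime import date, timedelta

# 18-month tweeting window, expressed as 546 days as stated in the text.
WINDOW_DAYS = 546


def tweets_in_window(pub_date, tweet_dates, window_days=WINDOW_DAYS):
    """Count tweets inside the fixed window after publication.

    Returns None when the record would be excluded: the (hypothetical)
    publication date is not in 2012, or a tweet predates publication,
    which suggests an incorrect publication date.
    """
    if pub_date.year != 2012:
        return None
    if any(t < pub_date for t in tweet_dates):
        return None
    cutoff = pub_date + timedelta(days=window_days)
    # The cutoff day itself is counted, matching "until 30 June 2013"
    # for a paper published on 1 January 2012.
    return sum(1 for t in tweet_dates if t <= cutoff)


# Illustrative (made-up) tweet dates for a paper published on 1 January 2012.
example = tweets_in_window(
    date(2012, 1, 1),
    [date(2012, 1, 1), date(2013, 6, 30), date(2013, 7, 1)],
)
print(example)
```

With a window of 546 days, the cutoff for a paper published on 1 January 2012 falls on 30 June 2013, and for 31 December 2012 on 30 June 2014, which agrees with the dates given in the text.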
Medical study type classification

Embase indexes all articles using a controlled vocabulary (the Emtree thesaurus), which contains hierarchically ordered keywords in a classical thesaurus structure. Among these keywords are study type classifications, of which some are directly identifiable as such (e.g. randomised controlled trials), while others require some translation (e.g. “sensitivity and specificity”, which is used for diagnostic accuracy studies). The Emtree thesaurus is designed for indexing and retrieval, and there is thus no given connection between the hierarchical ordering of study type keywords and different levels of research methodology. This is particularly important, as one of the predominant approaches to Western medical research and practice is so-called evidence-based medicine (EBM). One of the cornerstones of EBM is the distinction between study types and their hierarchical ordering based on how much ‘evidence’ a study is assumed to contribute to the understanding of a given problem (Greenhalgh, 2010). Different hierarchies exist, e.g. the Oxford Centre for Evidence Based Medicine’s “Levels of Evidence” (OCEBM Levels of Evidence Working Group, 2011).

Table 1. Medical study type classification system based on Röhrig et al. (2009) and OCEBM. Classifications with raised numerals have narrower terms which are not shown here.

We have chosen to use a particular hierarchy which allows a classification of study types by their level of research (Röhrig et al., 2009). We have extended the classification of Röhrig et al. (2009) by adding classification codes and the corresponding keywords in Emtree. The resulting system has been validated by two field experts and is displayed in Table 1. As can be seen, the classification system allows direct translation between specific Emtree keywords (we have added the broadest terms as well as their relevant narrower terms) and our classification codes on the third level (study_type).
The system allows grouping of study types into classes and research types (levels 2 and 1), thus allowing us to analyse the connection between tweets and the specific study types as well as the broader categories. Of the entire population of 238,560 records, 162,830 can be classified using our study type classification system. Of these, 36,595 (22.5%) received at least one tweet within the fixed 18-month tweet window. Of the remaining 75,730 records without a classification, 16,316 (21.5%) received at least one tweet. These data delimitations will be used to control for systematic errors in our main dataset (records with classifications). Among the classified records, 55% had only one classification, 26% had two, 12% had three and the remaining 7% had four or more. Records with n classifications are treated as n observations, resulting in more than 162,830 observations at each classification level. Some classes in our classification system were not observed at all in the dataset; these are omitted from the results section.

Statistical methods and indicators

For each study type classification level we report several statistics for all documents (denoted by the subscript A, e.g. $N_A$) as well as for the subset that has received at least one tweet (subscript T). The included statistics are the number of articles per classification (N), the mean number of tweets per article (μ), the standard deviation (σ), the percentage of articles with at least one tweet ($N_T/N_A$), and the mean normalised tweets ($\hat{\mu}$), defined as the ratio between μ for a specific classification and μ for the entire population. As the distributions of tweets for any classification are extremely skewed (see results), similar to citations, the adequacy of the mean as an indicator of average activity is debatable (Calver & Bradley, 2009).
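A minimal sketch of how the indicators listed above could be computed is given here. The function name `class_statistics`, the record layout and the toy data are hypothetical. Two choices are assumptions rather than statements from the text: the population mean is taken over documents (each counted once), and the population standard deviation (`pstdev`) is used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def class_statistics(records):
    """Per-classification tweet indicators.

    records: list of (tweet_count, [classification codes]) tuples.
    A record with n classifications contributes n observations.
    """
    # Assumption: the background mean counts every document exactly once.
    population_mean = mean(tweets for tweets, _ in records)

    by_class = defaultdict(list)
    for tweets, codes in records:
        for code in codes:
            by_class[code].append(tweets)

    stats = {}
    for code, counts in by_class.items():
        tweeted = [c for c in counts if c > 0]
        stats[code] = {
            "N_A": len(counts),                        # all observations
            "N_T": len(tweeted),                       # tweeted at least once
            "mu_A": mean(counts),                      # mean tweets per article
            "sigma_A": pstdev(counts),                 # assumption: population SD
            "share_tweeted": len(tweeted) / len(counts),
            "mu_hat": mean(counts) / population_mean,  # mean normalised tweets
        }
    return stats


# Made-up toy data; the class labels are placeholders, not Table 1 codes.
toy = [
    (4, ["meta"]),
    (0, ["meta", "rct"]),
    (2, ["rct"]),
    (0, ["basic"]),
    (0, ["basic"]),
    (0, ["basic"]),
]
for code, row in class_statistics(toy).items():
    print(code, row)
```

Because one record can carry several classifications, the observation counts summed over classes exceed the number of records, which mirrors the remark that the analysis contains more than 162,830 observations.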
However, while the median might be a methodologically more sound choice, the distributions are so extremely skewed that for study type level classification, medians are all 0 when all papers are included and either 1 or 2 if only tweeted papers are included. The corresponding means range from 0.35 to 1.74 and 2.02 to 5.01, providing considerably more information, especially as the scales for the mean are continuous. We therefore use the mean for comparisons, with due care and inclusion of standard deviations and percentage of tweeted articles to provide further information on differences in means. As we have large sample sizes, we expect any major differences in means to be real and not due to chance. However, to test this assumption, all classifications are tested pairwise and against the background population using the independent sample, unpaired Mann-Whitney test.
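To make the pairwise comparison concrete, the following is a hand-rolled sketch of a two-sided Mann-Whitney U test. It is not the authors' implementation: it assumes the normal approximation with a tie correction and no continuity correction, and the function names are hypothetical. In practice an established routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferred.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction: tweet counts contain many ties, especially at zero.
    tie_counts = {}
    for v in combined:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical, no evidence of a difference
        return u1, 1.0

    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u1, p_two_sided


# Made-up tweet counts for two hypothetical study types.
u, p = mann_whitney_u([0, 0, 0, 1], [2, 3, 5, 8])
print(u, p)
```

A rank-based test suits this setting because it makes no assumption about the shape of the heavily skewed tweet distributions. The tie correction matters because most papers share the value zero.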

Journal:
  • CoRR

Volume abs/1507.00154

Publication date 2015